6 Total Reward Criteria
Abstract
This chapter deals with total reward criteria. We discuss the existence and structure of optimal and nearly optimal policies and the convergence of value iteration algorithms under the so-called General Convergence Condition. This condition assumes that, for any initial state and for any policy, the expected sum of positive parts of rewards is finite. Positive, negative, and discounted dynamic programming problems are special cases when the General Convergence Condition holds.
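In symbols, a minimal formalization of this condition (with notation assumed here rather than taken from the chapter: states x_t, actions a_t, one-step rewards r(x_t, a_t), and r^+ = max(r, 0) for the positive part) is

\[
\mathbb{E}^{\pi}_{x}\left[\sum_{t=0}^{\infty} r^{+}(x_t, a_t)\right] < \infty
\qquad \text{for every initial state } x \text{ and every policy } \pi .
\]

Under such a condition the expected total reward \(\mathbb{E}^{\pi}_{x}\left[\sum_{t=0}^{\infty} r(x_t, a_t)\right]\) is well defined (possibly equal to \(-\infty\)), which is why positive, negative, and discounted dynamic programming appear as special cases.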
Similar resources
On the equivalence of two expected average reward criteria for zero-sum semi-Markov games
In this paper we study two basic optimality criteria used in the theory of zero-sum semi-Markov games. According to the first one, the average reward for player 1 is the lim sup of the expected total rewards over a finite number of jumps divided by the expected cumulative time of these jumps. According to the second definition, the average reward (for player 1) is the lim sup of the expected to...
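As a sketch of the first of these criteria (with symbols assumed for illustration: r_k the reward earned at the k-th jump, tau_k the time between jumps k and k+1, and pi, gamma the strategies of players 1 and 2), the average reward for player 1 could be written as

\[
\limsup_{n\to\infty}
\frac{\mathbb{E}^{\pi,\gamma}_{x}\left[\sum_{k=0}^{n-1} r_k\right]}
     {\mathbb{E}^{\pi,\gamma}_{x}\left[\sum_{k=0}^{n-1} \tau_k\right]} ,
\]

that is, the lim sup of the expected total reward over the first n jumps divided by the expected cumulative time of these jumps.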
Semi-Markov Decision Processes
Considered are infinite-horizon semi-Markov decision processes (SMDPs) with finite state and action spaces. Total expected discounted reward and long-run average expected reward optimality criteria are reviewed. Solution methodology for each criterion is given; constraints and variance sensitivity are also discussed.
Effect of Reward Function Choices in MDPs with Value-at-Risk
This paper studies Value-at-Risk problems in finite-horizon Markov decision processes (MDPs) with finite state space and two forms of reward function. First, we study the effect of the reward function on two criteria in a short-horizon MDP. Second, for long-horizon MDPs, we estimate the total reward distribution in a finite-horizon Markov chain (MC) with the help of spectral theory and the centr...
Exponential Lower Bounds for Policy Iteration
We study policy iteration for infinite-horizon Markov decision processes. It has recently been shown that policy-iteration-style algorithms have exponential lower bounds in a two-player game setting. We extend these lower bounds to Markov decision processes with the total-reward and average-reward optimality criteria.
2 Finite State and Action MDPs
In this chapter we study Markov decision processes (MDPs) with finite state and action spaces. This is the classical theory developed since the end of the fifties. We consider finite and infinite horizon models. For the finite horizon model the utility function of the total expected reward is commonly used. For the infinite horizon the utility function is less obvious. We consider several criteria: total...
Publication date: 2001